Knowledge Discovery from Data Streams
Authors
Abstract
Traditional practice in machine learning involves fixed data sets and static models. Most of the time, all the data is loaded into memory and the learning task is solved by performing multiple scans over the training data. These assumptions fail in new application areas such as ubiquitous computing, sensor networks, and e-commerce, where data flows continuously, possibly at high speed. Other examples include scientific data, customer click streams, telephone records, large sets of web pages, multimedia data, and sets of retail chain transactions. These sources of continuous data are called data streams. Data streams are increasingly important in the research community, as new algorithms are needed to process streaming data in reasonable time. Learning from data streams requires algorithms that process each example in constant time and memory, usually scanning the data only once. Moreover, if the process is not strictly stationary (as in most real-world applications), the target concept may change gradually over time. Learning is therefore an incremental task that requires incremental algorithms that take drift into account. Many researchers from different areas (data mining, machine learning, OLAP, databases, etc.) are designing new approaches or adapting traditional algorithms to data streams. The number of researchers in this field is also growing considerably, and data streams are becoming a consolidated topic at many conferences. For this special issue of Intelligent Data Analysis we selected four papers from those accepted for the Fourth International Workshop on Knowledge Discovery from Data Streams, an associated workshop of the 17th European Conference on Machine Learning (ECML) and the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD), co-located in Berlin, Germany, in 2006.
The selected papers cover a broad spectrum of research in knowledge discovery from data streams, ranging from recommendation algorithms and clustering to drifting concepts and frequent pattern mining. The common thread in all the papers is that learning occurs while data flows continuously. In the first paper, Schema Matching on Streams with Accuracy Guarantees, S. Jaroszewicz, L. Ivantysynova, and T. Scheffer address the problem of matching imperfectly documented schemas from data streams. The paper Modeling Dynamic Substate Chains among Massive States, by V. Nguyen and T. Washio, proposes a framework for handling high-dimensional data from large-scale transactional data warehouses. The paper Mining Frequent Items in a Stream Using Flexible Windows by T. Calders, N.
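The one-pass, constant-memory requirement described above can be illustrated with a small sketch. It is not taken from any of the selected papers: it uses Welford's well-known online method for the running mean and variance, and the names `RunningStats` and `consume` are hypothetical, chosen only for this example.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RunningStats:
    """Single-pass mean and variance (Welford's method), O(1) memory."""

    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        # Each example is processed once, in constant time, then discarded.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; reported as 0.0 when fewer than two examples seen.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


def consume(stream: Iterable[float]) -> RunningStats:
    """Scan the stream exactly once and return the accumulated statistics."""
    stats = RunningStats()
    for x in stream:
        stats.update(x)
    return stats
```

This sketch assumes a stationary stream and does not address concept drift. A sliding window or a forgetting factor would be one way to let older examples lose influence.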
Similar resources
Scalable Maintenance of Knowledge Discovery in an Ontology Stream
In dynamic settings where data is exposed by streams, knowledge discovery aims at learning associations of data across streams. In the semantic Web, streams expose their meaning through evolutive versions of ontologies. Such settings pose challenges of scalability for discovering (a posteriori) knowledge. In our work, the semantics, identifying knowledge similarity and rarity in streams, togeth...
Knowledge Discovery in Data Mining and Massive Data Mining
Knowledge discovery is a process of nontrivial extraction of previously unknown and presently useful information. The rapid advancement of technology has increased the rate at which data is distributed. The data generated from mobile applications, sensor applications, network monitoring, traffic management, weblogs etc. can be referred to as a data stream. The data streams are massive in na...
Sentiment Knowledge Discovery in Twitter Streaming Data
Micro-blogs are a challenging new source of information for data mining techniques. Twitter is a micro-blogging service built to discover what is happening at any moment in time, anywhere in the world. Twitter messages are short, and generated constantly, and well suited for knowledge discovery using data stream mining. We briefly discuss the challenges that Twitter data streams pose, focusing ...
Technical Outline: Knowledge Discovery
• Mining Uncertain Heterogeneous Information Networks: Scalable and intelligent methods to handle the challenges of uncertainty, heterogeneity, and large volumes and/or low bandwidth, focused on three major data sources in military: network + spatial-temporal data + streams.
• Domain Integration for Decision Support: Integrate knowledge learned from networks, spatial-temporal data and streams all ...
Entity Linking and Knowledge Discovery in Microblogs
Social media platforms have become significantly popular and are widely used for various customer services and communication. As a result, they experience a real-time emergence of new entities, ranging from product launches to trending mentions of celebrities. On the other hand, a Knowledge Base (KB) is used to represent entities of interest/relevance for general public, however, unlikely to co...